Abstract

Recently, a number of authors have proposed decoding schemes for Reed-Solomon (RS) codes based on multiple trials of
a simple RS decoding algorithm. In this paper, we present a rate-distortion (R-D) approach to analyze these multiple-decoding
algorithms for RS codes. This approach is first used to understand the asymptotic performance-versus-complexity trade-off of
multiple error-and-erasure decoding of RS codes. By defining an appropriate distortion measure between an error pattern and
an erasure pattern, the condition for a single error-and-erasure decoding to succeed reduces to a form where the distortion
is compared to a fixed threshold. Finding the best set of erasure patterns for multiple decoding trials then turns out to be
a covering problem which can be solved asymptotically by rate-distortion theory. Next, this approach is extended to analyze
multiple algebraic soft-decision (ASD) decoding of RS codes. Both analytical and numerical computations of the R-D functions
for the corresponding distortion measures are discussed. Simulation results show that the proposed algorithms using this
approach perform better than other algorithms with the same complexity.

I. Introduction 

Reed-Solomon (RS) codes are one of the most widely used error-correcting codes in digital communication and data
storage systems. This is primarily due to the fact that RS codes are maximum distance separable (MDS) codes, can correct
long bursts of errors, and have efficient hard-decision decoding (HDD) algorithms, such as the Berlekamp-Massey (BM)
algorithm, which can correct up to half the minimum distance ($d_{\min}$) of the code. An $(n,k)$ RS code of length $n$ and
dimension $k$ is known to have $d_{\min} = n-k+1$ due to its MDS nature.

Since the arrival of RS codes, considerable effort has been put into improving the decoding performance at the expense
of complexity. A breakthrough result of Guruswami and Sudan (GS) introduced a hard-decision list-decoding algorithm based
on algebraic bivariate interpolation and factorization techniques that can correct errors beyond half the minimum distance
of the code [1]. Nevertheless, HDD algorithms do not fully exploit the information provided by the channel output. Koetter
and Vardy (KV) later extended the GS decoder to an algebraic soft-decision (ASD) decoding algorithm by converting the
probabilities observed at the channel output into algebraic interpolation conditions in terms of a multiplicity matrix [2].
Both of these algorithms, however, have significant computational complexity. Thus, multiple runs of error-and-erasure and
error-only decoding with a low-complexity algorithm, such as the BM algorithm, have attracted renewed interest from researchers.
These algorithms essentially first construct a set of either erasure patterns [3], [4], test patterns [5], or patterns combining
both [6] and then attempt to decode using each pattern. There has also been recent interest in lowering the complexity per
decoding trial, as can be seen in [7], [8], [9].

In the scope of multiple error-and-erasure decoding, there have been several algorithms using different sets of erasure
patterns. After multiple decoding trials, these algorithms produce a list of candidate codewords and then pick the best
codeword on this list, whose size is usually small. The nature of multiple error-and-erasure decoding is to erase some of
the least reliable symbols since those symbols are more prone to be erroneous. The first algorithm of this type is called
Generalized Minimum Distance (GMD) [3] and it repeats error-and-erasure decoding while successively erasing an even
number of the least reliable positions (LRPs) (assuming that $d_{\min}$ is odd). More recent work by Lee and Kumar [4] proposes
a soft-information successive (multiple) error-and-erasure decoding (SED) that achieves better performance but also increases
the number of decoding attempts. Specifically, Lee and Kumar's SED$(l,f)$ algorithm runs multiple error-and-erasure decoding
trials with every combination of an even number $\le f$ of erasures within the $l$ LRPs.

A natural question that arises is how to construct the "best" set of erasure patterns for multiple error-and-erasure decoding.
Inspired by this, we first design a rate-distortion framework to analyze the asymptotic trade-off between performance and
complexity of multiple error-and-erasure decoding of RS codes. The framework is also extended to analyze multiple algebraic
soft-decision (ASD) decoding. Next, we propose a group of multiple-decoding algorithms based on this approach that achieve
a better performance-versus-complexity trade-off than other algorithms. The multiple-decoding algorithm that achieves the best
trade-off turns out to be a multiple error-only decoding using the set of patterns generated by random codes combined with
covering codes. These are the main results of this paper.
A. Outline of the paper

The paper is organized as follows. In Section II, we design an appropriate distortion measure and present a rate-distortion
framework to analyze the performance-versus-complexity trade-off of multiple error-and-erasure decoding of RS codes. Also
in this section, we propose a general multiple-decoding algorithm that can be applied to error-and-erasure decoding. Then,
in Section III, we discuss a numerical computation of the R-D function which is needed for the proposed algorithm. In Section
IV, we analyze both bit-level and symbol-level ASD decoding and design distortion measures so that they can fit into the
general algorithm. In Section V, we offer some extensions that help the algorithm achieve better performance and running
time. Simulation results are presented in Section VI and finally, the conclusion is provided in Section VII.



II. Multiple Error-and-Erasure Decoding 

In this section, we set up a rate-distortion framework to analyze multiple attempts of conventional hard-decision
error-and-erasure decoding.

Let $\mathbb{F}_{2^q}$ be the Galois field with $2^q$ elements denoted as $\alpha_1, \alpha_2, \ldots, \alpha_{2^q}$. We consider an $(n,k)$ RS code of length $n$,
dimension $k$ over $\mathbb{F}_{2^q}$. Assume that we transmit a codeword $c = (c_1, c_2, \ldots, c_n) \in \mathbb{F}_{2^q}^n$ over some channel and receive a
vector $r = (r_1, r_2, \ldots, r_n) \in \mathcal{Y}^n$ where $\mathcal{Y}$ is the receive alphabet for a single RS symbol. In this paper, we assume that $\mathcal{Y} = \mathbb{R}^q$
and all simulations are based on transmitting each of the $q$ bits in a symbol using Binary Phase-Shift Keying (BPSK) on
an Additive White Gaussian Noise (AWGN) channel. For each codeword index $i$, let $\pi_i : \{1,2,\ldots,2^q\} \to \{1,2,\ldots,2^q\}$ be
the permutation given by sorting $p_{i,j} = \Pr(c_i = \alpha_j \mid r)$ in decreasing order so that $p_{i,\pi_i(1)} \ge p_{i,\pi_i(2)} \ge \cdots \ge p_{i,\pi_i(2^q)}$. Then, we
can specify $y_{i,j} = \alpha_{\pi_i(j)}$ as the $j$-th most reliable symbol for $j = 1,\ldots,2^q$ at codeword index $i$. To obtain the reliability
of the codeword positions (indices), we construct the permutation $\sigma : \{1,2,\ldots,n\} \to \{1,2,\ldots,n\}$ given by sorting the
probabilities $p_{i,\pi_i(1)}$ of the most likely symbols in increasing order. Thus, codeword position $\sigma(i)$ is the $i$-th LRP. The
above notation will be used throughout this paper.

Example 1: Consider $n = 3$ and $q = 2$. Assume that the probabilities $p_{i,j}$, written in the form of a $4 \times 3$ matrix, are such that
$\pi_1(1,2,3,4) = (2,3,4,1)$, $\pi_2(1,2,3,4) = (3,4,2,1)$, $\pi_3(1,2,3,4) = (1,2,4,3)$ and $\sigma(1,2,3) = (2,3,1)$.
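
The sorting that defines $\pi_i$ and $\sigma$ can be sketched in a few lines of Python. This is only an illustration: the function name `reliability_orders` is ours, and the matrix `P` is a hypothetical one chosen so that it reproduces the permutations stated in Example 1; it is not the matrix of the original example.

```python
def reliability_orders(P):
    """P[j][i] = Pr(c_{i+1} = alpha_{j+1} | r): rows are symbols, columns are positions.

    Returns (pi, sigma) using 1-based values as in the text:
    pi[i] lists the symbol indices at position i+1 in decreasing probability,
    sigma lists the positions in increasing reliability (sigma[0] is the 1st LRP).
    """
    num_symbols, n = len(P), len(P[0])
    pi = []
    for i in range(n):
        column = [P[j][i] for j in range(num_symbols)]
        order = sorted(range(num_symbols), key=lambda j: -column[j])
        pi.append([j + 1 for j in order])
    # Probability of the most likely symbol at each position.
    top = [P[pi[i][0] - 1][i] for i in range(n)]
    sigma = [i + 1 for i in sorted(range(n), key=lambda i: top[i])]
    return pi, sigma


# Hypothetical 4 x 3 probability matrix (each column sums to 1).
P = [
    [0.01, 0.03, 0.90],
    [0.93, 0.05, 0.06],
    [0.04, 0.85, 0.01],
    [0.02, 0.07, 0.03],
]
pi, sigma = reliability_orders(P)
```

With this `P`, position 2 has the weakest most-likely symbol (0.85), so it is the first LRP.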
Condition 1: (Classical decoding threshold, see [10], [11]): If $e$ symbols are erased, a conventional hard-decision
error-and-erasure decoder such as the BM algorithm is able to correct $\nu$ errors in unerased positions if

$$2\nu + e < n-k+1. \qquad (1)$$
A. Conventional error and erasure patterns

Definition 1: (Conventional error and erasure patterns) We define $x^n \in \mathbb{Z}_2^n = \{0,1\}^n$ and $\hat{x}^n \in \mathbb{Z}_2^n$ as an error pattern and an
erasure pattern respectively, where $x_i = 0$ means that an error occurs (i.e. the most likely symbol is incorrect) and $\hat{x}_i = 0$ means
that an erasure occurs at index $i$.

Example 2: If $d_{\min}$ is odd then $\{111111\ldots, 001111\ldots, 000011\ldots, \ldots\}$ is the set of erasure patterns for the GMD
algorithm. For the SED(3,2) algorithm, the set of erasure patterns has the form $\{111111\ldots, 001111\ldots, 010111\ldots, 100111\ldots\}$.
Here, in each erasure pattern the letters are written in increasing reliability order of the codeword positions.
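
The two pattern sets of Example 2 can be enumerated with a short Python sketch. The function names are ours, and we assume (as is standard for GMD, though not stated explicitly above) that GMD erases $0, 2, \ldots, d_{\min}-1$ of the LRPs. Patterns are strings in increasing reliability order, with `0` marking an erasure.

```python
from itertools import combinations


def gmd_patterns(n, d_min):
    """Erase the 0, 2, 4, ... least reliable positions (d_min assumed odd)."""
    return ["0" * e + "1" * (n - e) for e in range(0, d_min, 2)]


def sed_patterns(n, l, f):
    """Every combination of an even number <= f of erasures within the l LRPs."""
    patterns = []
    for e in range(0, f + 1, 2):
        for erased in combinations(range(l), e):
            letters = ["1"] * n
            for i in erased:
                letters[i] = "0"
            patterns.append("".join(letters))
    return patterns


gmd = gmd_patterns(6, 5)     # first six letters of the GMD patterns
sed = sed_patterns(6, 3, 2)  # first six letters of the SED(3,2) patterns
```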
Let us revisit the question of how to construct the best set of erasure patterns for multiple error-and-erasure decoding. First,
it can be seen that a multiple error-and-erasure decoding succeeds if the condition (1) is satisfied during at least one round
of decoding. Thus, our approach is to design a distortion measure that converts the condition (1) into a form where the
distortion between an error pattern $x^n$ and an erasure pattern $\hat{x}^n$, denoted as $d(x^n,\hat{x}^n)$, is less than a fixed threshold.

Definition 2: Given a letter-by-letter distortion measure $\delta$, the distortion between an error pattern $x^n$ and an erasure
pattern $\hat{x}^n$ is defined by

$$d(x^n,\hat{x}^n) = \sum_{i=1}^{n} \delta(x_i,\hat{x}_i).$$

Proposition 1: If we choose the letter-by-letter distortion measure $\delta : \mathbb{Z}_2 \times \mathbb{Z}_2 \to \mathbb{R}_{\ge 0}$ as follows

$$\delta(0,0) = 1, \quad \delta(0,1) = 2, \quad \delta(1,0) = 1, \quad \delta(1,1) = 0 \qquad (2)$$

then the condition (1) for a successful error-and-erasure decoding reduces to the form where the distortion is less than
a fixed threshold

$$d(x^n,\hat{x}^n) < n-k+1.$$



Proof: First, we define $\chi_{s,t} = |\{i \in \{1,2,\ldots,n\} : x_i = s, \hat{x}_i = t\}|$ to count the number of $(x_i,\hat{x}_i)$ pairs equal to $(s,t)$ for
every $s,t \in \{0,1\}$. Noticing that $e = \chi_{0,0} + \chi_{1,0}$ and $\nu = \chi_{0,1}$, the condition (1) for one error-and-erasure decoding attempt
to succeed becomes $2\chi_{0,1} + \chi_{0,0} + \chi_{1,0} < n-k+1$. By seeing that $d(x^n,\hat{x}^n) = 2\chi_{0,1} + \chi_{0,0} + \chi_{1,0}$ we conclude the proof. ■
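
As a sanity check on this identity, the following Python sketch verifies by brute force that the distortion of Proposition 1 equals $2\nu + e$ for every pair of short binary patterns. The helper names are ours; the convention is the one of Definition 1 (an error is $x_i = 0$, an erasure is $\hat{x}_i = 0$).

```python
from itertools import product

# Letter-by-letter distortion of Proposition 1, keyed by (x_i, xhat_i).
DELTA = {(0, 0): 1, (0, 1): 2, (1, 0): 1, (1, 1): 0}


def distortion(x, xhat):
    return sum(DELTA[(a, b)] for a, b in zip(x, xhat))


def errors_and_erasures(x, xhat):
    e = sum(1 for b in xhat if b == 0)                          # erased positions
    nu = sum(1 for a, b in zip(x, xhat) if a == 0 and b == 1)   # errors left unerased
    return nu, e


def identity_holds(n):
    """Check d(x, xhat) == 2*nu + e over all 4**n pattern pairs."""
    for x in product((0, 1), repeat=n):
        for xhat in product((0, 1), repeat=n):
            nu, e = errors_and_erasures(x, xhat)
            if distortion(x, xhat) != 2 * nu + e:
                return False
    return True
```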

Next, we try to maximize the chance that this successful decoding condition is satisfied by at least one of the decoding
attempts (i.e. $d(x^n,\hat{x}^n) < n-k+1$ for at least one erasure pattern $\hat{x}^n$). Mathematically, we want to build a set $\mathcal{B}$ of no more
than $2^R$ erasure patterns $\hat{x}^n$ in order to

$$\max_{\mathcal{B}:\,|\mathcal{B}| \le 2^R} \Pr\left\{ \min_{\hat{x}^n \in \mathcal{B}} d(x^n,\hat{x}^n) < n-k+1 \right\}.$$

The exact answer to this problem is difficult to find. However, one can see it as a covering problem where one wants to
cover the space of error patterns using a minimum number of balls centered at the chosen erasure patterns.

This view leads to an asymptotic solution of the problem based on rate-distortion theory. More precisely, we view the error
pattern $x^n$ as a source sequence and the erasure pattern $\hat{x}^n$ as a reproduction sequence.
Rate-distortion theory shows that the set $\mathcal{B}$ of $2^R$ reproduction sequences
can be generated randomly so that

$$\lim_{n \to \infty} E\left[ d(x^n,\hat{x}^n) \right] < D$$

where the distortion $D$ is minimized for a given rate $R$. Thus, for large
enough $n$, we have

$$\min_{\hat{x}^n \in \mathcal{B}} d(x^n,\hat{x}^n) < D$$

with high probability. Here, $R$ and $D$ are closely related to the complexity and the performance, respectively, of the decoding
algorithm. Therefore, we characterize the trade-off between those two aspects using the relationship between $R$ and $D$.

[Figure: error pattern and erasure pattern]



B. Generalized error and erasure patterns 

In this subsection, we consider a generalization of the conventional error and erasure patterns under the same framework to
make better use of the soft information. At each index of the RS codeword, besides erasing the symbol we can try to decode
using not only the most likely symbol but also other ones as the hard-decision (HD) symbol. To handle up to the $l$ most
likely symbols at each index $i$, we let $\mathbb{Z}_{l+1} = \{0,1,\ldots,l\}$ and consider the following definition.

Definition 3: (Generalized error patterns and erasure patterns) Consider a positive integer $l \le 2^q$. Let us define $x^n \in \mathbb{Z}_{l+1}^n$ as
the generalized error pattern where, at index $i$, $x_i = j$ implies that the $j$-th most likely symbol is correct for $j \in \{1,2,\ldots,l\}$,
and $x_i = 0$ implies none of the first $l$ most likely symbols is correct. Let $\hat{x}^n \in \mathbb{Z}_{l+1}^n$ be the generalized erasure pattern
used for decoding where, at index $i$, $\hat{x}_i = j$ implies that the $j$-th most likely symbol is used as the hard-decision symbol
for $j \in \{1,2,\ldots,l\}$, and $\hat{x}_i = 0$ implies that an erasure is used at that index.

For simplicity, we will refer to $x^n$ as the error pattern and $\hat{x}^n$ as the erasure pattern like in the conventional case. Next,
we also want to convert the condition (1) to the form where $d(x^n,\hat{x}^n)$ is less than a fixed threshold. Proposition 1 is thereby
generalized into the following theorem.

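Theorem 1: We choose the letter-by-letter distortion measure $\delta : \mathbb{Z}_{l+1} \times \mathbb{Z}_{l+1} \to \mathbb{R}_{\ge 0}$ defined by $\delta(x,\hat{x}) = [\Delta]_{x,\hat{x}}$ in terms
of the $(l+1) \times (l+1)$ matrix

$$\Delta = \begin{pmatrix} 1 & 2 & \cdots & 2 & 2 \\ 1 & 0 & \cdots & 2 & 2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 2 & \cdots & 0 & 2 \\ 1 & 2 & \cdots & 2 & 0 \end{pmatrix}$$

whose rows and columns are indexed from $0$ to $l$. Using this, the condition (1) for a successful error-and-erasure decoding becomes

$$d(x^n,\hat{x}^n) < n-k+1.$$

Proof: The reasoning is similar to Proposition 1 using the fact that $e = \sum_{s=0}^{l} \chi_{s,0}$ and $\nu = \sum_{t=1}^{l} \sum_{s=0, s \ne t}^{l} \chi_{s,t}$ where
$\chi_{s,t} = |\{i \in \{1,2,\ldots,n\} : x_i = s, \hat{x}_i = t\}|$ for every $s,t \in \mathbb{Z}_{l+1}$. ■
For each $l = 1,2,\ldots,2^q$, we will refer to this generalized case as mBM-$l$ decoding.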

Example 3: We consider the case of mBM-2 decoding where $l = 2$. The distortion measure is given by the following matrix

$$\Delta = \begin{pmatrix} 1 & 2 & 2 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{pmatrix}.$$

Here, at each codeword position, we consider the first and second most likely symbols as the two hard-decision choices like
in the Chase-type decoding method proposed by Bellorado and Kavcic [7].
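
The mBM-$l$ distortion matrix can be built mechanically from Definition 3 and the counting in the proof of Theorem 1: an erasure ($\hat{x} = 0$) always costs 1, a hard decision on the correct symbol costs 0, and a hard decision on a wrong symbol costs 2. The Python sketch below (the function name is ours) encodes exactly that rule, with rows indexed by $x$ and columns by $\hat{x}$.

```python
def mbm_distortion_matrix(l):
    """(l+1) x (l+1) distortion matrix for mBM-l; entry [x][xhat]."""
    delta = []
    for x in range(l + 1):
        row = [1]  # xhat = 0: erasure, costs 1 whatever x is
        for xhat in range(1, l + 1):
            # Hard decision on the xhat-th most likely symbol:
            # free if that symbol is correct, otherwise an error (cost 2).
            row.append(0 if x == xhat else 2)
        delta.append(row)
    return delta


delta_1 = mbm_distortion_matrix(1)  # conventional case of Proposition 1
delta_2 = mbm_distortion_matrix(2)  # mBM-2 case of Example 3
```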



C. Proposed General Multiple-Decoding Algorithm 

In this section, we propose a general multiple-decoding algorithm for RS codes based on the rate-distortion approach.
This general algorithm applies not only to multiple error-and-erasure decoding but also to multiple-decoding of other decoding
schemes that we will discuss later. The first step is designing a distortion measure that converts the condition for a single
decoding to succeed to the form where distortion is less than a fixed threshold. After that, decoding proceeds as described
below.

• Phase I: Compute rate-distortion function.

Step 1: Transmit $\tau$ (say $\tau = 1000$) arbitrary test RS codewords, indexed by time $t = 1,2,\ldots,\tau$, over the channel and compute
a set of $\tau$ $2^q \times n$ matrices $P_1^{(t)}$ where $[P_1^{(t)}]_{j,i} = p^{(t)}_{i,\pi_i^{(t)}(j)}$ is the probability of the $j$-th most likely symbol at position $i$ during
time $t$.

Step 2: For each time $t$, obtain the matrix $P_2^{(t)}$ from $P_1^{(t)}$ through a permutation $\sigma^{(t)} : \{1,2,\ldots,n\} \to \{1,2,\ldots,n\}$ that
sorts the probabilities $p^{(t)}_{i,\pi_i^{(t)}(1)}$ in increasing order to indicate the reliability order of codeword positions. Take the entry-wise
average of all $\tau$ matrices $P_2^{(t)}$ to get an average matrix $\bar{P}$.

Step 3: Compute the R-D function of a source sequence (error pattern) with probability of source letters derived from $\bar{P}$ and
the designed distortion measure (see Section III and Section V-B). Determine the point on the R-D curve that corresponds
to a designated rate $R$ along with the test-channel input-probability distribution vector $q$ that achieves that point.

• Phase II: Run actual decoder.

Step 4: Based on the actual received signal sequence, compute $p_{i,\pi_i(1)}$ and determine the permutation $\sigma$ that gives the
reliability order of codeword positions by sorting $p_{i,\pi_i(1)}$ in increasing order.

Step 5: Randomly generate a set of $2^R$ erasure patterns using the test-channel input-probability distribution vector $q$ and
permute the indices of each erasure pattern by the permutation $\sigma^{-1}$.

Step 6: Run multiple attempts of the corresponding decoding scheme (e.g. error-and-erasure decoding) using the set of
erasure patterns in Step 5 to produce a list of candidate codewords.

Step 7: Use Maximum-Likelihood (ML) decoding to pick the best codeword on the list.
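
Step 5 can be sketched in Python as follows. This is a minimal illustration under our own assumptions: `q[i]` is taken to be the distribution over erasure letters for the $i$-th least reliable position (the per-position factorization is the one discussed in Section III), `sigma` is the 1-based reliability permutation, and the function name is hypothetical.

```python
import random


def generate_erasure_patterns(q, sigma, num_patterns, seed=0):
    """Draw erasure patterns letter by letter, then map them to codeword order.

    q[i][k]  : probability of erasure letter k at the (i+1)-th LRP
    sigma[i] : codeword position (1-based) of the (i+1)-th LRP
    """
    rng = random.Random(seed)
    n = len(q)
    patterns = []
    for _ in range(num_patterns):
        # Pattern written in increasing reliability order.
        in_lrp_order = [
            rng.choices(range(len(q[i])), weights=q[i])[0] for i in range(n)
        ]
        # Undo the sort: the letter drawn for the i-th LRP goes to position sigma(i).
        pattern = [None] * n
        for i in range(n):
            pattern[sigma[i] - 1] = in_lrp_order[i]
        patterns.append(pattern)
    return patterns


# Degenerate distributions make the result deterministic: always erase the
# first LRP (letter 0) and keep the hard decision (letter 1) elsewhere.
q = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
patterns = generate_erasure_patterns(q, sigma=[2, 3, 1], num_patterns=4)
```

Each resulting pattern would then drive one decoding attempt in Step 6.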

III. Computing The Rate-Distortion Function 

In this section, we will present a numerical method to compute the R-D function and the test-channel input-probability
distribution that achieves a specific point on the R-D curve. This probability distribution will be needed to randomly generate
the set of erasure patterns in the general multiple-decoding algorithm that we have proposed.

For an arbitrary discrete distortion measure, it can be difficult to compute the R-D function analytically. Fortunately,
the Blahut-Arimoto (B-A) algorithm (see details in [12], [13]) gives an alternating minimization technique that efficiently
computes the R-D function of a single discrete source. More precisely, given a parameter $s < 0$ which represents the slope of
the R-D curve at a specific point and an arbitrary all-positive initial test-channel input-probability distribution vector $q^{(0)}$,
the B-A algorithm shows us how to compute the rate-distortion point $(R_s, D_s)$ by means of computing the test-channel
input-probability distribution vector $q^* = \lim_{t \to \infty} q^{(t)}$ and the test-channel transition probability matrix $Q^* = \lim_{t \to \infty} Q^{(t)}$ that
achieves that point.

However, it is not straightforward to apply the B-A algorithm to compute the R-D function for a discrete source sequence $x^n$ (an
error pattern in our context) of $n$ independent but non-identical source components $x_i$. In order to do that, we consider
the group of source letters $(j_1, j_2, \ldots, j_n)$ where $j_i \in \mathcal{J}$ as a super-source letter $J \in \mathcal{J}^n$, the group of reproduction
letters $(k_1, k_2, \ldots, k_n)$ where $k_i \in \mathcal{K}$ as a super-reproduction letter $K \in \mathcal{K}^n$, and the source sequence $x^n$ as a single source. For
each super-source letter $J$, $p_J = \Pr(x^n = J) = \prod_{i=1}^{n} \Pr(x_i = j_i) = \prod_{i=1}^{n} p_{j_i}$ follows from the independence of source components.
While we could apply the B-A algorithm to this source directly, the complexity is a problem because the alphabet sizes
for $J$ and $K$ become the super-alphabet sizes $|\mathcal{J}|^n$ and $|\mathcal{K}|^n$ respectively. Instead, we avoid this computational challenge
by choosing the initial test-channel input-probability distribution so that it can be factorized into a product of $n$ initial
test-channel input-probability components, i.e. $q_K^{(0)} = \prod_{i=1}^{n} q_{k_i}^{(0)}$. Then, we see that this factorization rule still applies after
every step of the iterative process. By doing this, for each parameter $s$ we only need to compute the rate-distortion pair for
each component (or index $i$) separately and sum them together. This is captured in the following theorem.

Theorem 2: (Factored Blahut-Arimoto algorithm) Consider a discrete source sequence $x^n$ of $n$ independent but non-identical
source components $x_i$. Given a parameter $s < 0$, the rate and the distortion for this source sequence are given by

$$R_s = \sum_{i=1}^{n} R_{i,s} \quad \text{and} \quad D_s = \sum_{i=1}^{n} D_{i,s}$$

where the components $R_{i,s}$ and $D_{i,s}$ are computed by the B-A algorithm with the parameter $s$. This pair of rate and
distortion can be achieved by the corresponding test-channel input-probability distribution $q_K = \Pr(\hat{x}^n = K) = \prod_{i=1}^{n} q_{k_i}$, where
the component probability distribution $q_{k_i} = \Pr(\hat{x}_i = k_i)$.

Proof: See the Appendix. ■
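
To make the factored computation concrete, here is a minimal Python sketch of the standard Blahut-Arimoto iteration for one component, followed by the per-component summation of Theorem 2. It is an illustration rather than the authors' implementation: the function names, the uniform initialization and the fixed iteration count are our assumptions, and rates are reported in bits.

```python
import math


def blahut_arimoto(p, d, s, iters=500):
    """One R-D point for a single discrete source.

    p : source probabilities, d[j][k] : distortion, s < 0 : slope parameter.
    Returns (R, D, q) with q the test-channel input-probability distribution.
    """
    J, K = len(p), len(d[0])
    q = [1.0 / K] * K  # all-positive initial distribution
    for _ in range(iters):
        # Update the transition matrix Q(k|j) for the current q.
        Q = []
        for j in range(J):
            w = [q[k] * math.exp(s * d[j][k]) for k in range(K)]
            total = sum(w)
            Q.append([wk / total for wk in w])
        # Update q as the output marginal of Q under p.
        q = [sum(p[j] * Q[j][k] for j in range(J)) for k in range(K)]
    D = sum(p[j] * Q[j][k] * d[j][k] for j in range(J) for k in range(K))
    R = sum(
        p[j] * Q[j][k] * math.log2(Q[j][k] / q[k])
        for j in range(J) for k in range(K)
        if p[j] > 0 and Q[j][k] > 0
    )
    return R, D, q


def factored_rd_point(sources, distortions, s):
    """Theorem 2: run B-A per component and add up rates and distortions."""
    R_total, D_total, qs = 0.0, 0.0, []
    for p, d in zip(sources, distortions):
        R, D, q = blahut_arimoto(p, d, s)
        R_total += R
        D_total += D
        qs.append(q)
    return R_total, D_total, qs


# Check against a known closed form: a uniform binary source with Hamming
# distortion has R(D) = 1 - H(D) bits.
hamming = [[0, 1], [1, 0]]
R1, D1, _ = blahut_arimoto([0.5, 0.5], hamming, s=-2.0)
R2, D2, _ = factored_rd_point([[0.5, 0.5]] * 2, [hamming] * 2, s=-2.0)
```

The per-position vectors in `qs` are what a random pattern generator would sample from.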



IV. Multiple Algebraic Soft Decision Decoding (ASD) 

In this section, we analyze and design a distortion measure to convert the condition for successful ASD decoding to a 
suitable form so that we can apply the general multiple-decoding algorithm to ASD decoding. 

First, let us give a brief review of ASD decoding of RS codes. Consider a set $\{\beta_1, \beta_2, \ldots, \beta_n\}$ of $n$ distinct elements in $\mathbb{F}_{2^q}$.
From each message polynomial $f(X) = f_0 + f_1 X + \ldots + f_{k-1}X^{k-1}$, we can obtain a codeword $c = (c_1, c_2, \ldots, c_n)$ by evaluating
the message polynomial at $\{\beta_i\}_{i=1}^{n}$, i.e. $c_i = f(\beta_i)$ for $i = 1,2,\ldots,n$. Given a received vector $r = (r_1, r_2, \ldots, r_n)$, we can
compute the a posteriori probability (APP) matrix $P$ as follows.

$$[P]_{j,i} = p_{i,j} = \Pr(c_i = \alpha_j \mid r) \quad \text{for } 1 \le i \le n,\ 1 \le j \le 2^q.$$

The ASD decoding as in [2] has the following main steps. 

1) Multiplicity Assignment: Use a particular multiplicity assignment scheme (MAS) to derive a $2^q \times n$ multiplicity matrix,
denoted as $M$, of non-negative integer entries from the APP matrix $P$.

2) Interpolation: Construct a bivariate polynomial $Q(X,Y)$ of minimum $(1,k-1)$ weighted degree that passes through
each of the points $(\beta_j, \alpha_i)$ with multiplicity $m_{i,j}$ for $i = 1,2,\ldots,2^q$ and $j = 1,2,\ldots,n$.

3) Factorization: Find all polynomials f(X) of degree less than k such that Y —f(X) is a factor of Q(X,Y) and re-evaluate 
these polynomials to form a list of candidate codewords. 

In this paper, we denote $m = \max_{i,j} m_{i,j}$ as the maximum multiplicity. Intuitively, higher multiplicity should be put on more
likely symbols. Increasing $m$ generally improves the performance of ASD decoding. However, one of the drawbacks of
ASD decoding is that its decoding complexity is roughly $O(m^6)$, which increases sharply with $m$. Thus, in this section we
will work with small $m$ to keep the complexity affordable.

One of the main contributions of [2] is to offer a condition for successful ASD decoding represented in terms of two 
quantities specified as the score and the cost as follows. 

Definition 4: The score $S_M(c)$ with respect to a codeword $c$ and a multiplicity matrix $M$ is defined as $S_M(c) = \sum_{j=1}^{n} m_{[c_j],j}$
where $[c_j] = i$ such that $\alpha_i = c_j$. The cost $C_M$ of a multiplicity matrix $M$ is defined as $C_M = \frac{1}{2} \sum_{i=1}^{2^q} \sum_{j=1}^{n} m_{i,j}(m_{i,j}+1)$.

Condition 2: (ASD decoding threshold, see [2], [14], [15]). The transmitted codeword will be on the list if

$$T(S_M) > C_M \quad \text{where} \quad T(S_M) = (a+1)\left[ S_M - \frac{a}{2}(k-1) \right]$$

$$\text{for any } a \in \mathbb{N} \text{ such that } a(k-1) < S_M \le (a+1)(k-1). \qquad (4)$$

To match the general framework, the ASD decoding threshold (or condition for successful ASD decoding) should be converted
to the form where the distortion is smaller than a fixed threshold.
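
The score, the cost and the threshold test of Condition 2 are simple to evaluate numerically. The Python sketch below uses our own function names and assumes integer multiplicities and $k \ge 2$; the inequality $T(S_M) > C_M$ is multiplied by 2 so that all arithmetic stays in integers.

```python
def score(M, codeword_rows):
    """S_M(c): sum of the multiplicities sitting on the transmitted symbols.

    M[i][j] is the multiplicity of symbol i at position j, and
    codeword_rows[j] is the row index of the transmitted symbol c_j.
    """
    return sum(M[i][j] for j, i in enumerate(codeword_rows))


def cost(M):
    """C_M = (1/2) * sum of m(m+1) over all entries of M."""
    return sum(m * (m + 1) for row in M for m in row) // 2


def on_list(S, C, k):
    """Condition 2: pick a with a(k-1) < S <= (a+1)(k-1), then test T(S) > C."""
    if S <= 0:
        return False  # no non-negative integer a satisfies a(k-1) < S
    a = (S - 1) // (k - 1)
    return (a + 1) * (2 * S - a * (k - 1)) > 2 * C


# Toy case: n = 7, k = 3, multiplicity 1 on a single hard-decision symbol
# (row 0) at every position. Errors are positions whose true symbol is row 1.
M = [[1] * 7] + [[0] * 7 for _ in range(7)]
two_errors = [0, 0, 0, 0, 0, 1, 1]
three_errors = [0, 0, 0, 0, 1, 1, 1]
ok_two = on_list(score(M, two_errors), cost(M), k=3)
ok_three = on_list(score(M, three_errors), cost(M), k=3)
```

In this toy case the outcome matches the classical bound $2\nu < n-k+1 = 5$: two errors are decodable and three are not.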



A. Bit-level ASD case 

In this subsection, we consider multiple trials of ASD decoding using bit-level erasure patterns. A bit-level error
pattern $b^N \in \mathbb{Z}_2^N$ and a bit-level erasure pattern $\hat{b}^N \in \mathbb{Z}_2^N$ have length $N = n \times q$ since each symbol has $q$ bits. Similar to Definition
1 of a conventional error pattern and a conventional erasure pattern, $b_i = 0$ in a bit-level error pattern implies that a bit-level
error occurs and $\hat{b}_i = 0$ in a bit-level erasure pattern implies that a bit-level erasure occurs.

From each bit-level erasure pattern we can specify entries of the multiplicity matrix $M$ using the bit-level MAS proposed
in [14] as follows: for each codeword position, assign multiplicity 2 to the symbol with no bit erased, assign multiplicity
1 to each of the two candidate symbols if there is 1 bit erased, and assign multiplicity zero to all the symbols if there
are $\ge 2$ bits erased. All the other entries are zeros by default. This MAS has a larger decoding region compared to the
conventional error-and-erasure decoding scheme.
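
The bit-level MAS described above can be written as a small per-symbol rule. In this Python sketch the function name and the data layout are our own choices: `hard_bits` holds the hard-decision bits of one symbol, `erased` flags which of those bits are erased, and the result maps each candidate symbol (as a bit tuple) to its multiplicity, with every symbol not listed having multiplicity zero.

```python
def bit_level_multiplicities(hard_bits, erased):
    """Multiplicities for one codeword position under the bit-level MAS."""
    num_erased = sum(1 for flag in erased if flag)
    if num_erased == 0:
        # No bit erased: multiplicity 2 on the hard-decision symbol.
        return {tuple(hard_bits): 2}
    if num_erased == 1:
        # One bit erased: multiplicity 1 on each of the two candidates that
        # agree with the hard decision on all unerased bits.
        pos = list(erased).index(True)
        candidates = {}
        for bit in (0, 1):
            symbol = list(hard_bits)
            symbol[pos] = bit
            candidates[tuple(symbol)] = 1
        return candidates
    # Two or more bits erased: the whole column is zero.
    return {}


none_erased = bit_level_multiplicities([1, 0, 1], [False, False, False])
one_erased = bit_level_multiplicities([1, 0, 1], [False, True, False])
two_erased = bit_level_multiplicities([1, 0, 1], [True, True, False])
```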

Condition 3: (Bit-level ASD decoding threshold, see [14]) For RS codes of rate $\frac{k}{n} \ge \frac{2}{3} + \frac{1}{n}$, ASD decoding using the
proposed bit-level MAS will succeed (i.e. the transmitted codeword is on the list) if

$$3\nu_b + e_b < \frac{3}{2}(n-k+1) \qquad (5)$$

where $e_b$ is the number of bit-level erasures and $\nu_b$ is the number of bit-level errors in unerased locations.

We can choose an appropriate distortion measure according to the following proposition, which is a natural extension of
Proposition 1 in the symbol level.

Proposition 2: If we choose the bit-level letter-by-letter distortion measure $\delta : \mathbb{Z}_2 \times \mathbb{Z}_2 \to \mathbb{R}_{\ge 0}$ as follows

$$\delta(0,0) = 1, \quad \delta(0,1) = 3, \quad \delta(1,0) = 1, \quad \delta(1,1) = 0 \qquad (6)$$

then the condition (5) becomes

$$d(b^N,\hat{b}^N) < \frac{3}{2}(n-k+1). \qquad (7)$$

Proof: The proof uses the same reasoning as the proof of Proposition 1. ■
Remark 1: We refer to the multiple-decoding of bit-level ASD as m-b-ASD.



B. Symbol-level ASD case 

In this subsection, we try to convert the condition for successful ASD decoding in general to the form that suits our goal. 
We will also determine which multiplicity assignment schemes allow us to do so. 

Definition 5: (Multiplicity type) For some codeword position, let us assign multiplicity $m_j$ to the $j$-th most likely symbol
for $j = 1,2,\ldots,l$ where $l \le 2^q$. The remaining entries in the column are zeros by default. We call the sequence $(m_1, m_2, \ldots, m_l)$
the column multiplicity type for "top-$l$" decoding.

First, we notice that a choice of multiplicity types in ASD decoding at each codeword position has the similar meaning to a 
choice of erasure decisions in the conventional error-and-erasure decoding. However, in ASD decoding we are more flexible 
and may have more types of erasures. For example, assigning multiplicity zero to all the symbols (all-zero multiplicity type) 
at codeword position i corresponds to the case when we have a complete erasure at that position. Assigning the maximum 
multiplicity $m$ to one symbol corresponds to the case when we choose that symbol as the hard-decision one. Hence, with
some abuse of terminology, we also use the term (generalized) erasure pattern $\hat{x}^n$ for the multiplicity assignment scheme in
the ASD context. Each erasure-letter $\hat{x}_i$ gives the multiplicity type for the corresponding column of the multiplicity matrix $M$.

Definition 6: (Error and erasure patterns for ASD decoding) Consider a MAS with $z$ multiplicity types. Let $\hat{x}^n \in \{1,2,\ldots,z\}^n$
be an erasure pattern where, at index $i$, $\hat{x}_i = j$ implies that multiplicity type $j$ is used at column $i$ of the multiplicity matrix $M$.
Notice that the definition of an error pattern $x^n \in \mathbb{Z}_{l+1}^n$ in Definition 3 applies unchanged here.

Rate-distortion theory gives us the intuition that, in general, the more multiplicity types (erasure choices) we have, the better
the performance of multiple ASD decoding becomes as $n$ grows large. Thus, we want to find as many multiplicity types as possible
for "top-$l$" decoding that allow us to convert the condition for successful ASD decoding to the correct form.

Example 4: Choosing $m = 2$, for example, gives four column multiplicity types for "top-2" decoding as follows: the first
is $(2,0)$ where we assign multiplicity 2 to the most likely symbol $y_{i,1}$, the second is $(1,1)$ where we assign equal multiplicity
1 to the first and second most likely symbols $y_{i,1}$ and $y_{i,2}$, the third is $(0,2)$ where we assign multiplicity 2 to the second
most likely symbol $y_{i,2}$, and the fourth is $(0,0)$ where we assign multiplicity zero to all the symbols at index $i$ (i.e. the $i$-th
column of $M$ is an all-zero column). As a corollary of Theorem 3 below, the distortion matrix that converts (4) to the correct
form for this case is

$$\Delta = \begin{pmatrix} 2 & 5/3 & 2 & 1 \\ 0 & 2/3 & 2 & 1 \\ 2 & 2/3 & 0 & 1 \end{pmatrix}.$$

The following definition and theorem provide a set of allowable multiplicity types that converts the condition for successful 
ASD decoding into the form where distortion is less than a fixed threshold. 

Definition 7: The set of allowable multiplicity types for "top-$l$" decoding with maximum multiplicity $m$ is defined to be¹

$$\mathcal{A}(m,l) = \left\{ (m_1, m_2, \ldots, m_l) : \sum_{r=1}^{l} m_r \le m \text{ and } \sum_{r=1}^{l} m_r(m - m_r) \le (m+1)\left( |\{r : m_r \ne 0\}| - 1 \right) \min_{r:\, m_r \ne 0} m_r \right\}. \qquad (8)$$

Taking the elements of this set in an arbitrary order, we let the $j$-th multiplicity type in the allowable set be $(m_{j,1}, m_{j,2}, \ldots, m_{j,l})$.

Example 5: $\mathcal{A}(3,2)$ consists of all permutations of $(3,0)$, $(2,1)$, $(1,1)$, $(0,0)$. Meanwhile, $\mathcal{A}(2,2)$ comprises all the
permutations of $(2,0)$, $(1,1)$, $(0,0)$ and we refer to the multiple ASD decoding algorithm using this set of multiplicity types as
mASD-2. $\mathcal{A}(3,3)$ consists of all the permutations of $(3,0,0)$, $(0,0,0)$, $(1,1,0)$, $(2,1,0)$, $(1,1,1)$ and this case is referred to as
mASD-3. We also consider another case called mASD-2a that uses the set of multiplicity types $\{(2,0), (1,1), (0,0)\}$.
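
The allowable sets can be enumerated by brute force. The Python sketch below (the function name is ours) tests the two requirements on a multiplicity type, namely that the multiplicities sum to at most $m$ and that $\sum_r m_r(m-m_r) \le (m+1)(|\{r : m_r \ne 0\}| - 1)\min_{r: m_r \ne 0} m_r$, with the minimum taken as 0 for the all-zero type. It reproduces the sets listed in Example 5.

```python
from itertools import product


def allowable_types(m, l):
    """Enumerate the allowable multiplicity types for top-l decoding."""
    types = []
    for t in product(range(m + 1), repeat=l):
        nonzero = [v for v in t if v != 0]
        smallest = min(nonzero) if nonzero else 0  # convention for the all-zero type
        lhs = sum(v * (m - v) for v in t)
        rhs = (m + 1) * (len(nonzero) - 1) * smallest
        if sum(t) <= m and lhs <= rhs:
            types.append(t)
    return types


A22 = set(allowable_types(2, 2))
A32 = set(allowable_types(3, 2))
A33 = set(allowable_types(3, 3))
```

Types such as $(1,0)$ or $(2,0,0)$ with $m = 3$ are rejected because a single nonzero entry below $m$ makes the left side positive while the right side is zero.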

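Theorem 3: Let $z = |\mathcal{A}(m,l)|$ be the number of multiplicity types in a MAS for "top-$l$" decoding with maximum
multiplicity $m$. Let $\delta : \mathbb{Z}_{l+1} \times \mathbb{Z}_{z+1} \setminus \{0\} \to \mathbb{R}_{\ge 0}$ be a letter-by-letter distortion measure defined by $\delta(x,\hat{x}) = [\Delta]_{x,\hat{x}}$, where
$\Delta$ is the $(l+1) \times z$ matrix

$$\Delta = \begin{pmatrix} \mu_1 & \mu_2 & \cdots & \mu_z \\ \mu_1 - 2m_{1,1}/m & \mu_2 - 2m_{2,1}/m & \cdots & \mu_z - 2m_{z,1}/m \\ \mu_1 - 2m_{1,2}/m & \mu_2 - 2m_{2,2}/m & \cdots & \mu_z - 2m_{z,2}/m \\ \vdots & \vdots & \ddots & \vdots \\ \mu_1 - 2m_{1,l}/m & \mu_2 - 2m_{2,l}/m & \cdots & \mu_z - 2m_{z,l}/m \end{pmatrix}$$

with $\mu_t = 1 + \sum_{r=1}^{l} \frac{m_{t,r}(m_{t,r}+1)}{m(m+1)}$. Then, the condition (4) for successful ASD decoding of a RS code with rate
$\frac{k}{n} \ge \frac{1}{n} + \frac{m(m+3)}{(m+1)(m+2)}$ is equivalent to

$$d(x^n,\hat{x}^n) < n-k+1. \qquad (9)$$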
Proof: [Sketch of proof] (See details in [16]) Let $S$ and $C$ be the score and cost of the multiplicity assignment. First, we
show that $S > \frac{C}{a+1} + \frac{a}{2}(k-1)$ in (4) implies that $\left( m - \frac{m(m+1)}{2(a+1)} \right) n - \frac{a}{2}(k-1) \ge 0$. Combining this inequality with the high-rate
constraint in Theorem 3 implies that $a < m+1$. From (4), we also know that $(a+1)S - C > \frac{1}{2}a(a+1)(k-1) \ge \frac{1}{2}aS$ and
this implies that $2C < (a+2)S$. But, the conditions of the theorem can also be used to show that $2C \ge (m+1)S$. Combining
this with $2C < (a+2)S$ gives a contradiction unless $a > m-1$. Thus, we conclude that $a = m$.

Therefore, the condition in (4) is equivalent to $S > \frac{C}{m+1} + \frac{m}{2}(k-1)$ because $a(k-1) < S$ is a consequence of $a = m$
and $S \le (m+1)(k-1)$ is satisfied by the high-rate constraint. Finally, one can show that $S > \frac{C}{m+1} + \frac{m}{2}(k-1)$ is equivalent
to $d(x^n,\hat{x}^n) < n-k+1$ with the chosen distortion matrix. ■

¹We use the convention that $\min_{r:\, m_r \ne 0} m_r = 0$ if $\{r : m_r \ne 0\} = \emptyset$.
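
The distortion matrix of Theorem 3 can be computed exactly with rational arithmetic. In the Python sketch below (the function name is ours), row 0 holds $\mu_t = 1 + \sum_r m_{t,r}(m_{t,r}+1)/(m(m+1))$ and row $r \ge 1$ holds $\mu_t - 2m_{t,r}/m$. For the four "top-2" types with $m = 2$, taken in the order $(2,0), (1,1), (0,2), (0,0)$, this formula gives the rows $(2, 5/3, 2, 1)$, $(0, 2/3, 2, 1)$ and $(2, 2/3, 0, 1)$.

```python
from fractions import Fraction


def asd_distortion_matrix(types, m):
    """(l+1) x z distortion matrix for a list of multiplicity types.

    types[t] = (m_{t,1}, ..., m_{t,l}); entry [x][t] is the distortion when the
    x-th most likely symbol is correct (x = 0: none of the top l is correct)
    and multiplicity type t is used.
    """
    l = len(types[0])
    mu = [
        1 + sum(Fraction(v * (v + 1), m * (m + 1)) for v in t)
        for t in types
    ]
    rows = [list(mu)]  # x = 0: no multiplicity lands on the correct symbol
    for r in range(l):
        rows.append(
            [mu[t] - Fraction(2 * types[t][r], m) for t in range(len(types))]
        )
    return rows


delta = asd_distortion_matrix([(2, 0), (1, 1), (0, 2), (0, 0)], m=2)
```

Two columns behave as expected from the error-and-erasure case: the all-zero type costs 1 in every row, like an erasure, and putting the full multiplicity on one symbol costs 0 when that symbol is correct and 2 otherwise.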

Remark 2: For a fixed $m$, the size of $\mathcal{A}(m,l)$ is maximized when $l = m$. Multiplicity types $(0,\ldots,0)$, $(1,\ldots,1)$ and any
permutation of $(m,0,\ldots,0)$, $(\lfloor m/2 \rfloor, \lceil m/2 \rceil, 0, \ldots, 0)$ are always in the allowable set $\mathcal{A}(m,m)$.

V. Some Extensions and Generalizations 
A. Erasure patterns using covering codes 

The R-D framework we use is most suitable when $n \to \infty$. For a finite $n$, the random coding approach may have problems
with only a few LRPs. We can instead use good covering codes to handle these LRPs. In the scope of covering problems,
one can use an $l$-ary $t_c$-covering code (e.g. a perfect Hamming or Golay code) with covering radius $t_c$ to cover the whole
space of $l$-ary vectors of the same length. The covering may still work well if the distortion measure is close to, but not
exactly equal to, the Hamming distortion.

In order to take care of up to the $l$ most likely symbols at each of the $n_c$ LRPs of an $(n,k)$ RS code, we consider an $(n_c,k_c)$
$l$-ary $t_c$-covering code whose codeword alphabet is $\mathbb{Z}_{l+1} \setminus \{0\} = \{1,2,\ldots,l\}$. Then, we give a definition of the (generalized)
error patterns and erasure patterns for this case. In order to draw similarities between this case and the previous cases, we
still use the terminology "generalized erasure pattern" and shorten it to erasure pattern even if error-only decoding is used.
For error-only decoding, Condition 1 for successful decoding becomes

$$\nu < \frac{1}{2}(n-k+1).$$

Definition 8: (Error and erasure patterns for error-only decoding) Let us define $x^n \in \mathbb{Z}_{l+1}^n = \{0,1,\ldots,l\}^n$ as an error
pattern where, at index $i$, $x_i = j$ implies that the $j$-th most likely symbol is correct for $j \in \{1,2,\ldots,l\}$, and $x_i = 0$ implies
none of the first $l$ most likely symbols is correct. Let $\hat{x}^n \in \{1,2,\ldots,l\}^n$ be an erasure pattern where, at index $i$, $\hat{x}_i = j$ implies
that the $j$-th most likely symbol is chosen as the hard-decision symbol for $j \in \{1,2,\ldots,l\}$.

Proposition 3: If we choose the letter-by-letter distortion measure $\delta : \mathbb{Z}_{l+1} \times \mathbb{Z}_{l+1} \setminus \{0\} \to \mathbb{R}_{\ge 0}$ defined by $\delta(x,\hat{x}) = [\Delta]_{x,\hat{x}}$
in terms of the $(l+1) \times l$ matrix

$$\Delta = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 0 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 0 \end{pmatrix} \qquad (10)$$

then the condition for successful error-only decoding becomes

$$d(x^n,\hat{x}^n) < \frac{1}{2}(n-k+1). \qquad (11)$$

Proof: It follows directly from $d(x^n,\hat{x}^n) = \sum_{t=1}^{l} \sum_{s=0, s \ne t}^{l} \chi_{s,t} = \nu$. ■
Remark 3: If we delete the first row, which corresponds to the case where none of the first $l$ most likely symbols is correct,
then the distortion measure is exactly the Hamming distortion.

Split covering approach: We can break an error pattern $x^n$ into two sub-error patterns $x^{\mathrm{LRPs}} = x_{\sigma(1)} x_{\sigma(2)} \ldots x_{\sigma(n_c)}$ of the $n_c$ least reliable positions and $x^{\mathrm{MRPs}} = x_{\sigma(n_c+1)} \ldots x_{\sigma(n)}$ of the $n - n_c$ most reliable positions. Similarly, we can break an erasure pattern $\hat{x}^n$ into two sub-erasure patterns $\hat{x}^{\mathrm{LRPs}} = \hat{x}_{\sigma(1)} \hat{x}_{\sigma(2)} \ldots \hat{x}_{\sigma(n_c)}$ and $\hat{x}^{\mathrm{MRPs}} = \hat{x}_{\sigma(n_c+1)} \ldots \hat{x}_{\sigma(n)}$. Let $z_{n_c}$ be the number of positions in the $n_c$ LRPs where none of the first $l$ most likely symbols is correct, or $z_{n_c} = |\{i = 1, 2, \ldots, n_c : x_{\sigma(i)} = 0\}|$. If we assign the set of all sub-erasure patterns $\hat{x}^{\mathrm{LRPs}}$ to be an $(n_c, k_c)$ $t_c$-covering code then $d(x^{\mathrm{LRPs}}, \hat{x}^{\mathrm{LRPs}}) \leq t_c + z_{n_c}$ because this covering code has covering radius $t_c$. Since $d(x^n, \hat{x}^n) = d(x^{\mathrm{LRPs}}, \hat{x}^{\mathrm{LRPs}}) + d(x^{\mathrm{MRPs}}, \hat{x}^{\mathrm{MRPs}})$, in order to increase the probability that the condition (11) is satisfied we want to make $d(x^{\mathrm{MRPs}}, \hat{x}^{\mathrm{MRPs}})$ as small as possible by the use of the R-D approach.
The following proposition summarizes how to generate a set of $2^R$ erasure patterns for multiple runs of error-only decoding.

Proposition 4: In each erasure pattern, the letter sequence at the $n_c$ LRPs is set to be a codeword of an $(n_c, k_c)$ $l$-ary $t_c$-covering code. The letter sequence of the remaining $n - n_c$ MRPs is generated randomly by the R-D method (see Section III) with rate $R_{\mathrm{MRPs}} = R - k_c \log_2 l$ and the distortion measure in (10). Since this covering code has $l^{k_c}$ codewords, the total rate is $R_{\mathrm{MRPs}} + \log_2 l^{k_c} = R$.

Example 6: For a (7,4,3) binary Hamming code, which has covering radius $t_c = 1$, we take care of the 2 most likely symbols at each of the 7 LRPs. We see that 1001001 is a codeword of this Hamming code and then form erasure patterns $1001001\,\hat{x}_8 \hat{x}_9 \ldots \hat{x}_n$ with the assumption that the positions are written in increasing reliability order. The $2^{R-4}$ sub-erasure patterns $\hat{x}_8 \hat{x}_9 \ldots \hat{x}_n$ are generated randomly using the R-D approach with rate $(R - 4)$.
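
The covering-radius claim in Example 6 and the rate split of Proposition 4 can be checked by brute force. The sketch below is illustrative only: it assumes one common convention for the (7,4) Hamming code, where the parity-check columns are the binary expansions of 1 to 7, so its codeword list need not match the column ordering used for the codeword quoted in the example. All function names are hypothetical.

```python
from itertools import product
from math import log2

def hamming74_codewords():
    """Enumerate the 16 codewords of a (7,4) Hamming code.

    Assumption: column `pos` of the parity-check matrix is the binary
    expansion of pos (pos = 1..7), so the syndrome is an XOR of positions.
    """
    codewords = []
    for word in product((0, 1), repeat=7):
        syndrome = 0
        for pos, bit in enumerate(word, start=1):
            if bit:
                syndrome ^= pos
        if syndrome == 0:
            codewords.append(word)
    return codewords

def covering_radius(codewords, n):
    """Largest Hamming distance from any binary n-tuple to its nearest codeword."""
    worst = 0
    for word in product((0, 1), repeat=n):
        nearest = min(sum(a != b for a, b in zip(word, c)) for c in codewords)
        worst = max(worst, nearest)
    return worst

def mrp_rate(total_rate, k_c, l):
    """Rate left for the MRPs: R_MRPs = R - k_c * log2(l)."""
    return total_rate - k_c * log2(l)

code = hamming74_codewords()
print(len(code), covering_radius(code, 7), mrp_rate(11, 4, 2))
```

With $R = 11$, $k_c = 4$ and $l = 2$, the random part of each erasure pattern is left with rate $R - 4 = 7$.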

Remark 4: While it also makes sense to use covering codes for the $n_c$ LRPs of the erasure patterns and set the rest to be letter 1 (i.e., choose the most likely symbol as the hard decision), our simulation results show that the performance can be improved by using a combination of covering codes and random (i.e., generated by the R-D approach) codes.



B. Closed form rate-distortion functions 

For some simple distortion measures, we can compute the R-D functions analytically in closed form. First, we view an error pattern as a sequence of independent but non-identical random sources. Then, we compute the component R-D functions at each index of the sequence and use convex optimization techniques to allocate the total rate and distortion to the various components. This method converges to the solution faster than the numerical method in Section III. The following two theorems describe how to compute the R-D functions for the simple distortion measures of Propositions 1 and 2.

Theorem 4: (Conventional error-and-erasure decoding) Let $p_i = \Pr(x_i = 1)$. The overall rate-distortion function is given by $R(D) = \sum_{i=1}^{n} \left[ H(p_i) - H(\tilde{D}_i) \right]^+$ where $\tilde{D}_i = D_i + p_i - 1$ and $\tilde{D}_i$ can be found by a reverse water-filling procedure:

$$\tilde{D}_i = \begin{cases} \lambda & \text{if } \lambda < \min\{p_i, 1 - p_i\} \\ \min\{p_i, 1 - p_i\} & \text{otherwise} \end{cases}$$

where $\lambda$ should be chosen so that $\sum_{i=1}^{n} \tilde{D}_i = D + \sum_{i=1}^{n} p_i - n$. The $R(D)$ function can be achieved by the test-channel input-probability distribution

$$q_{0,i} = \Pr(\hat{x}_i = 0) = \frac{1 - p_i - \tilde{D}_i}{1 - 2\tilde{D}_i} \quad \text{and} \quad q_{1,i} = \Pr(\hat{x}_i = 1) = \frac{p_i - \tilde{D}_i}{1 - 2\tilde{D}_i}.$$

Proof: [Sketch of proof] (See [16] for details) With the distortion measure of Proposition 1, we follow the method in [17] to compute the rate-distortion function component $R_i(D_i) = [H(p_i) - H(D_i + p_i - 1)]^+$ and the test-channel input-probability distribution $q_{0,i} = \frac{2 - 2p_i - D_i}{3 - 2(p_i + D_i)}$ and $q_{1,i} = \frac{1 - D_i}{3 - 2(p_i + D_i)}$ for each index $i$. Then, one can show that the optimal allocation of rate and distortion to the various components is given by a reverse water-filling procedure like in [18]. ■
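
The reverse water-filling step of Theorem 4 can be sketched numerically as follows. This is a minimal illustration, not the procedure used to produce the results in this paper: the water level $\lambda$ is found by bisection (our choice), the function names are hypothetical, and the formulas are those of the theorem as read here, namely $\tilde{D}_i = \min\{\lambda, \min\{p_i, 1 - p_i\}\}$ with $\sum_i \tilde{D}_i = D + \sum_i p_i - n$.

```python
import numpy as np

def binary_entropy(p):
    """Binary entropy in bits, clipped to avoid log(0)."""
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def reverse_water_filling(p, D, iters=200, tol=1e-9):
    """Return (R(D), D_tilde) for independent bits with p[i] = Pr(x_i = 1)."""
    p = np.asarray(p, dtype=float)
    cap = np.minimum(p, 1 - p)        # largest useful D_tilde_i (rate 0 there)
    target = D + p.sum() - p.size     # required sum of the D_tilde_i
    if target < -tol or target > cap.sum() + tol:
        raise ValueError("D is outside the range covered by this sketch")
    lo, hi = 0.0, 0.5                 # bisection on the water level lambda
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.minimum(lam, cap).sum() < target:
            lo = lam
        else:
            hi = lam
    d_tilde = np.minimum(hi, cap)
    rate = np.maximum(binary_entropy(p) - binary_entropy(d_tilde), 0).sum()
    return float(rate), d_tilde

# Toy source with three positions of different reliability (hypothetical values).
rate, d_tilde = reverse_water_filling([0.9, 0.8, 0.6], D=1.0)
print(rate, d_tilde)
```

Positions whose cap $\min\{p_i, 1 - p_i\}$ lies below the water level receive zero rate, so the rate budget is spent on the least reliable positions first.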
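Theorem 5: (Bit-level ASD case in Proposition 2) The overall rate-distortion function in this case is given by $R(D) = \sum_{i=1}^{n} [R_i(\lambda)]^+$ where $R_i(\lambda) = H(p_i) - H\!\left(\frac{\lambda^2}{1+\lambda+\lambda^2}\right) + \left(p_i - \frac{1+\lambda}{1+\lambda+\lambda^2}\right) H\!\left(\frac{1}{1+\lambda}\right)$ and the distortion component $D_i$ is given by

$$D_i = \begin{cases} \frac{1+2\lambda+3\lambda^2}{1+\lambda+\lambda^2} - p_i \frac{1+2\lambda}{1+\lambda} & \text{if } R_i(\lambda) > 0 \\ \min\{1, 3(1 - p_i)\} & \text{otherwise} \end{cases}$$

where $\lambda \in (0,1)$ should be chosen so that $\sum_{i=1}^{n} D_i = D$. The $R(D)$ function can be achieved by the test-channel input-probability distribution

$$q_{i,0} = \Pr(\hat{b}_i = 0) = \frac{(1+\lambda) - p_i(1+\lambda+\lambda^2)}{1 - \lambda^2} \quad \text{and} \quad q_{i,1} = \Pr(\hat{b}_i = 1) = \frac{p_i(1+\lambda+\lambda^2) - \lambda(1+\lambda)}{1 - \lambda^2}.$$

Proof: [Sketch of proof] (See [16] for details) With the distortion measure of Proposition 2, using the method in [17] we can compute the rate-distortion function component $R_i(\lambda_i) = H(p_i) - H\!\left(\frac{\lambda_i^2}{1+\lambda_i+\lambda_i^2}\right) + \left(p_i - \frac{1+\lambda_i}{1+\lambda_i+\lambda_i^2}\right) H\!\left(\frac{1}{1+\lambda_i}\right)$ where $\lambda_i$ is a Lagrange multiplier such that $D_i = \frac{1+2\lambda_i+3\lambda_i^2}{1+\lambda_i+\lambda_i^2} - p_i \frac{1+2\lambda_i}{1+\lambda_i}$ for each index $i$. Then, the Kuhn-Tucker conditions define the overall rate allocation. ■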
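
To make the parametric form of Theorem 5 concrete, the following sketch evaluates one component for a given $\lambda$. It rests on stated assumptions rather than on the authors' code: the formulas coded below are our reading of the theorem, i.e., $R_i(\lambda) = H(p_i) - H(\lambda^2/S) + (p_i - (1+\lambda)/S)\,H(1/(1+\lambda))$ and $D_i = (1+2\lambda+3\lambda^2)/S - p_i(1+2\lambda)/(1+\lambda)$ with $S = 1+\lambda+\lambda^2$, and the test-channel probabilities share the denominator $1-\lambda^2$. The function names are hypothetical, and the outer search over $\lambda$ that enforces $\sum_i D_i = D$ is not shown.

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def bit_level_component(p, lam):
    """Return (rate, distortion, q0, q1) for one bit position, 0 < lam < 1."""
    s = 1 + lam + lam ** 2
    rate = h2(p) - h2(lam ** 2 / s) + (p - (1 + lam) / s) * h2(1 / (1 + lam))
    if rate <= 0:
        # Zero-rate component: always pick the cheaper of the two letters.
        return 0.0, min(1.0, 3 * (1 - p)), None, None
    dist = (1 + 2 * lam + 3 * lam ** 2) / s - p * (1 + 2 * lam) / (1 + lam)
    q0 = ((1 + lam) - p * s) / (1 - lam ** 2)
    q1 = (p * s - lam * (1 + lam)) / (1 - lam ** 2)
    return rate, dist, q0, q1

# Two hypothetical positions evaluated at the same multiplier lam = 0.5.
print(bit_level_component(0.70, 0.5))  # active component (positive rate)
print(bit_level_component(0.95, 0.5))  # reliable position, gets zero rate
```

Summing the returned rates and distortions over all positions for a common $\lambda$ gives one point of the overall curve; sweeping $\lambda$ over $(0,1)$ traces the rest.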



VI. Simulation results 

Using simulations, we consider the performance of the (255,239) RS code over an AWGN channel with BPSK as the modulation format. The mBM-1 curve corresponds to standard error-and-erasure BM decoding with multiple erasure patterns. For $l > 1$, the mBM-$l$ curves correspond to error-and-erasure BM decoding with multiple decoding trials using both erasures and top-$l$ symbols. The mASD-$m$ curves correspond to multiple ASD decoding trials with maximum multiplicity $m$. The number of trial decoding patterns is $2^R$ where $R$ is denoted in parentheses in each algorithm's acronym (e.g., mBM-2(11) uses $R = 11$).

Fig. 1 shows the R-D curves for various algorithms at $E_b/N_0 = 5.2$ dB. For this code, the fixed threshold for decoding is $D = n - k + 1 = 17$. Therefore, one might expect that algorithms whose average distortion is less than 17 should have a frame error rate (FER) less than 1/2. The R-D curve allows one to estimate the number of decoding patterns required to achieve this FER. Notice that the mBM-1 algorithm at rate 0, which is very similar to conventional BM decoding, has an expected distortion of roughly 24. For this reason, the FER of conventional decoding is close to 1. The R-D curve tells us that trying roughly $2^{16}$ (i.e., $R = 16$) erasure patterns would reduce the FER to roughly 1/2 because this is where the distortion drops down to 17. Likewise, the mBM-2(11) algorithm has an expected distortion of less than 14. So we expect (and our simulations confirm) that the FER should be less than 1/2. One weakness of this approach is that the R-D curve describes only the average distortion and does not directly consider the probability that the distortion is greater than 17. Still, we can make the following observations from the R-D curve. Even at low rates (e.g., R > 4), we see that the distortion D achieved by mBM-2 is roughly the same as mBM-3, mASD-2, and mASD-3 but smaller than mASD-2a and mBM-1. This implies that mBM-2 is no worse than the more complicated ASD-based approaches for a wide range of rates (i.e., 4 < R < 35).

Fig. 1. A realization of R-D curves at $E_b/N_0 = 5.2$ dB for various decoding algorithms for the (255,239) RS code over an AWGN channel. (Axis label: distortion D.)

Fig. 2. Performance of various decoding algorithms for the (255,239) RS code over an AWGN channel. (Axis label: $E_b/N_0$ in dB.)

The FER of various algorithms can be seen in Fig. 2. The focus on $R = 11$ allows us to make fair comparisons with SED(12,12). With the same number of decoding trials, mBM-2(11) outperforms SED(12,12) by 0.3 dB at an FER of $10^{-4}$. Even mBM-2(7), with many fewer decoding trials, outperforms both SED(12,12) and the KV algorithm with $m = \infty$. Among all our proposed algorithms with rate $R = 11$, mBM-HM74(11) achieves the best performance. This algorithm uses the Hamming (7,4) covering code for the 7 LRPs and the R-D approach for the remaining codeword positions. Meanwhile, small differences in the performance between mBM-2(11), mBM-3(11), mASD-2(11), and mASD-3(11) suggest that: (i) taking care of the 2 most likely symbols at each codeword position is good enough for multiple decoding of high-rate RS codes and (ii) multiple runs of error-and-erasure decoding are almost as good as multiple runs of ASD decoding. Recall that this result is also correctly predicted by the R-D analysis. Moreover, it is quite reasonable since we know that the gain of GS decoding, with infinite multiplicity, over the BM algorithm is negligible for high-rate RS codes. To compare with the LCC($\eta = 4$) Chase-type approach used in [7], we also consider the mBM-HM74(4) algorithm, which uses the Hamming (7,4) covering code for the 7 LRPs and the hard-decision pattern for the remaining codeword positions. This shows that the covering code achieves better performance with the same number ($2^4$) of decoding attempts. The comparison is not entirely fair, however, because of their low-complexity approach to multiple decoding. We believe, nevertheless, that their technique can be generalized to covering codes.

VII. Conclusion 

A rate-distortion approach is proposed as a unified framework to analyze multiple decoding trials, with various algorithms, 
of RS codes in terms of performance and complexity. A connection is made between the complexity and performance (in 
some asymptotic sense) of these multiple-decoding algorithms and the rate and distortion of an associated R-D problem. 
Covering codes are also combined with the rate-distortion approach to mitigate the suboptimality of random codes when the 
effective block-length is not large. As part of this analysis, we also present numerical and analytical computations of the 
rate-distortion function for sequences of independent but non-identical sources. Finally, the simulation results show that our 
proposed algorithms based on the R-D approach achieve a better performance-versus-complexity trade-off than previously 
proposed algorithms. One key result is that, for high-rate RS codes, multiple-decoding using the standard BM algorithm is 
as good as multiple-decoding using more complex ASD algorithms. 

In this paper, we only discuss the rate-distortion approach to solving the covering problem for the erasure patterns. However, the performance can be further improved by focusing on the rate-distortion error-exponent. This allows us to approximately solve the covering problem for finite $n$ rather than just as $n \to \infty$. The complexity of multiple decoding can also be decreased by using clever techniques to lower the complexity per decoding trial (e.g., [7]). We will address these two improvements in a future paper.



Appendix I 
Proof of Theorem 2

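First, let us recall that for each source component $x_i$, the B-A algorithm computes the R-D pair in the following steps:

1) Choose an arbitrary all-positive test-channel input-probability distribution vector $q^{(0)}$.

2) Iterate the following steps at $t = 1, 2, \ldots$

$$Q^{(t)}_{k_i|j_i} = \frac{q^{(t-1)}_{k_i} \exp(s\,\delta_{j_i k_i})}{\sum_{k_i'} q^{(t-1)}_{k_i'} \exp(s\,\delta_{j_i k_i'})}, \qquad q^{(t)}_{k_i} = \sum_{j_i} p_{j_i} Q^{(t)}_{k_i|j_i},$$

where $Q_{k_i|j_i} = \Pr(\hat{x}_i = k_i \mid x_i = j_i)$ is the transition probability. It is shown by B-A that $q^{(t)}_{k_i} \to q^{*}_{k_i}$ and $Q^{(t)}_{k_i|j_i} \to Q^{*}_{k_i|j_i}$ as $t \to \infty$.

The rate and distortion can be computed by $R_{i,s} = \sum_{j_i} \sum_{k_i} p_{j_i} Q^{*}_{k_i|j_i} \log \frac{Q^{*}_{k_i|j_i}}{\sum_{j_i'} p_{j_i'} Q^{*}_{k_i|j_i'}}$ and $D_{i,s} = \sum_{j_i} \sum_{k_i} p_{j_i} Q^{*}_{k_i|j_i} \delta_{j_i k_i}$.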
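
For a single source component, the iteration recalled above can be sketched as follows. This is a minimal illustration using the standard Blahut-Arimoto updates with a slope parameter $s < 0$. The function names are hypothetical, and the $2 \times 2$ distortion matrix in the example (rows: error, correct; columns: erase, keep) is our assumption for conventional error-and-erasure decoding, chosen so that the output can be compared with the component formula $H(p_i) - H(D_i + p_i - 1)$ of Theorem 4.

```python
import math
import numpy as np

def blahut_arimoto(p, delta, s, iters=500):
    """Return (rate in bits, distortion, output distribution) at slope s < 0.

    p[j]        : source probabilities
    delta[j, k] : distortion between source letter j and reproduction letter k
    """
    p = np.asarray(p, dtype=float)
    delta = np.asarray(delta, dtype=float)
    A = np.exp(s * delta)
    q = np.full(delta.shape[1], 1.0 / delta.shape[1])  # all-positive start
    for _ in range(iters):
        Q = A * q                                      # unnormalized Q[k | j]
        Q /= Q.sum(axis=1, keepdims=True)
        q = p @ Q                                      # updated output distribution
    Q = A * q
    Q /= Q.sum(axis=1, keepdims=True)
    out = p @ Q
    rate = float(np.sum(p[:, None] * Q * np.log2(Q / out)))
    dist = float(np.sum(p[:, None] * Q * delta))
    return rate, dist, out

def h2(x):
    """Binary entropy in bits for 0 < x < 1."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

# One position with Pr(correct) = 0.8; source order is (error, correct).
rate, dist, q = blahut_arimoto([0.2, 0.8], [[1, 2], [1, 0]], s=math.log(1 / 9))
print(rate, dist, q)
print(h2(0.8) - h2(dist + 0.8 - 1))  # closed-form component rate for comparison
```

Running the same routine for every position with a common $s$ and summing the outputs gives the overall point $(R_s, D_s)$, which is the factorization that the remainder of this appendix establishes.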

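Now, we will prove Theorem 2. Since the input-distribution vector of the test channel is an arbitrary all-positive vector, we choose $q^{(0)}_K$ so that it can be factorized as $q^{(0)}_K = \prod_{i=1}^{n} q^{(0)}_{k_i}$.

Suppose after step $t$, we have $q^{(t)}_K = \prod_{i=1}^{n} q^{(t)}_{k_i}$. Then, by the iterative computing process in the B-A algorithm, we have

$$q^{(t+1)}_K = q^{(t)}_K \sum_{J} \frac{p_J \exp(s\,\delta_{JK})}{\sum_{K'} q^{(t)}_{K'} \exp(s\,\delta_{JK'})} = \prod_{i=1}^{n} q^{(t)}_{k_i} \sum_{j_1, \ldots, j_n} \frac{\prod_{i=1}^{n} p_{j_i} \exp(s\,\delta_{j_i k_i})}{\sum_{k_1'} \cdots \sum_{k_n'} \prod_{i=1}^{n} q^{(t)}_{k_i'} \exp(s\,\delta_{j_i k_i'})} = \prod_{i=1}^{n} q^{(t)}_{k_i} \sum_{j_i} \frac{p_{j_i} \exp(s\,\delta_{j_i k_i})}{\sum_{k_i'} q^{(t)}_{k_i'} \exp(s\,\delta_{j_i k_i'})} = \prod_{i=1}^{n} q^{(t+1)}_{k_i}$$

where $p_J = \Pr(x^n = J) = \prod_{i=1}^{n} \Pr(x_i = j_i) = \prod_{i=1}^{n} p_{j_i}$ follows from the independence of source components, and $\delta_{JK} = \sum_{i=1}^{n} \delta_{j_i k_i}$ because the distortion measure is letter-by-letter.
Hence, by induction we know the factorization rule still applies after every step $t$ in the process. Therefore, we have

$$q^{*}_K = \lim_{t \to \infty} q^{(t)}_K = \lim_{t \to \infty} \prod_{i=1}^{n} q^{(t)}_{k_i} = \prod_{i=1}^{n} \lim_{t \to \infty} q^{(t)}_{k_i} = \prod_{i=1}^{n} q^{*}_{k_i}, \quad \text{and} \quad Q^{*}_{K|J} = \lim_{t \to \infty} Q^{(t)}_{K|J} = \lim_{t \to \infty} \prod_{i=1}^{n} Q^{(t)}_{k_i|j_i} = \prod_{i=1}^{n} \lim_{t \to \infty} Q^{(t)}_{k_i|j_i} = \prod_{i=1}^{n} Q^{*}_{k_i|j_i},$$

and then we can show that

$$R_s = \sum_{J} \sum_{K} p_J Q^{*}_{K|J} \log \frac{Q^{*}_{K|J}}{\sum_{J'} p_{J'} Q^{*}_{K|J'}} = \sum_{j_1, \ldots, j_n} \sum_{k_1, \ldots, k_n} \prod_{i=1}^{n} p_{j_i} Q^{*}_{k_i|j_i} \sum_{i=1}^{n} \log \frac{Q^{*}_{k_i|j_i}}{\sum_{j_i'} p_{j_i'} Q^{*}_{k_i|j_i'}} = \sum_{i=1}^{n} \sum_{j_i} \sum_{k_i} p_{j_i} Q^{*}_{k_i|j_i} \log \frac{Q^{*}_{k_i|j_i}}{\sum_{j_i'} p_{j_i'} Q^{*}_{k_i|j_i'}} = \sum_{i=1}^{n} R_{i,s},$$

$$D_s = \sum_{J} \sum_{K} p_J Q^{*}_{K|J} \delta_{JK} = \sum_{j_1, \ldots, j_n} \sum_{k_1, \ldots, k_n} \prod_{i=1}^{n} p_{j_i} Q^{*}_{k_i|j_i} \sum_{i=1}^{n} \delta_{j_i k_i} = \sum_{i=1}^{n} \sum_{j_i} \sum_{k_i} p_{j_i} Q^{*}_{k_i|j_i} \delta_{j_i k_i} = \sum_{i=1}^{n} D_{i,s}.$$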
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